In this project, I have used the “cars_multi” and “cars_price” datasets. I have tried to understand how the variables in these datasets relate to each other, to uncover interesting patterns, and to communicate those findings. I’m going to focus on the correlation between mpg and the other properties.
I am going to use the following R libraries to assist in my analysis:
library(ggplot2)
require(GGally)
## Loading required package: GGally
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
require(ggthemes)
## Loading required package: ggthemes
require(plotly)
## Loading required package: plotly
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(heatmaply)
## Loading required package: heatmaply
## Warning: package 'heatmaply' was built under R version 4.0.5
## Loading required package: viridis
## Loading required package: viridisLite
require(ggcorrplot)
## Loading required package: ggcorrplot
## Warning: package 'ggcorrplot' was built under R version 4.0.5
cars_multi <- read.csv("cars_multi.csv")
cars_price <- read.csv("cars_price.csv")
cars <- left_join(cars_multi, cars_price, by="ID")
# Treat model year as a factor, labelled with the sorted year values
model_years <- sort(unique(cars$model))
cars$model <- factor(cars$model, labels = model_years)
origins <- c('USA', 'Europe', 'Japan')
cars$origin <- factor(cars$origin, labels = origins)
str(cars)
## 'data.frame': 398 obs. of 11 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : int 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : chr "130" "165" "150" "150" ...
## $ weight : int 3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
## $ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ model : Factor w/ 13 levels "70","71","72",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ origin : Factor w/ 3 levels "USA","Europe",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ car_name : chr "chevrolet chevelle malibu" "buick skylark 320" "plymouth satellite" "amc rebel sst" ...
## $ price : num 25562 24221 27241 33685 20000 ...
cars$horsepower <- as.numeric(cars$horsepower)
## Warning: NAs introduced by coercion
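The coercion warning means that some horsepower entries are not valid numbers; the summary that follows reports 6 NA values in that column. A minimal sketch of how the offending raw strings could be inspected before coercing is shown here. The vector `hp_raw` and its "n/a" placeholder are hypothetical stand-ins for the raw character column, not values taken from the dataset.

```r
# Hypothetical stand-in for the raw character column cars$horsepower
hp_raw <- c("130", "165", "n/a", "150")

# Coerce quietly, then flag entries that were present but failed to parse
hp_num <- suppressWarnings(as.numeric(hp_raw))
bad <- is.na(hp_num) & !is.na(hp_raw)

unique(hp_raw[bad])  # the raw strings that could not be parsed
sum(bad)             # how many rows are affected
```

Running the same steps on the real column before the `as.numeric()` call would show which placeholder the source file uses for missing horsepower.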
summary(cars)
## ID mpg cylinders displacement
## Min. : 1.0 Min. : 9.00 Min. :3.000 Min. : 68.0
## 1st Qu.:100.2 1st Qu.:17.50 1st Qu.:4.000 1st Qu.:104.2
## Median :199.5 Median :23.00 Median :4.000 Median :148.5
## Mean :199.5 Mean :23.51 Mean :5.455 Mean :193.4
## 3rd Qu.:298.8 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:262.0
## Max. :398.0 Max. :46.60 Max. :8.000 Max. :455.0
##
## horsepower weight acceleration model origin
## Min. : 46.0 Min. :1613 Min. : 8.00 73 : 40 USA :249
## 1st Qu.: 75.0 1st Qu.:2224 1st Qu.:13.82 78 : 36 Europe: 70
## Median : 93.5 Median :2804 Median :15.50 76 : 34 Japan : 79
## Mean :104.5 Mean :2970 Mean :15.57 82 : 31
## 3rd Qu.:126.0 3rd Qu.:3608 3rd Qu.:17.18 75 : 30
## Max. :230.0 Max. :5140 Max. :24.80 70 : 29
## NA's :6 (Other):198
## car_name price
## Length:398 Min. : 1598
## Class :character 1st Qu.:23110
## Mode :character Median :30000
## Mean :29684
## 3rd Qu.:36430
## Max. :53746
##
## Univariate Plots: weight
The most common weights fall between 2000 and 3000, and for the majority of the cars the weight value is unique, that is, it occurs only once in the dataset.
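The plot for this section is not reproduced here, so the following is only a sketch of how a weight histogram and the uniqueness check could be produced. It assumes a data frame with a numeric `weight` column such as `cars`; the function names and the bin width of 250 are illustrative choices, not the original settings.

```r
library(ggplot2)

# Histogram of car weights; binwidth = 250 is an arbitrary illustrative choice
plot_weight_hist <- function(df, binwidth = 250) {
  ggplot(df, aes(x = weight)) +
    geom_histogram(binwidth = binwidth, colour = "white") +
    labs(x = "Weight", y = "Number of cars")
}

# Number of distinct weight values that occur exactly once
count_unique_weights <- function(df) {
  sum(table(df$weight) == 1)
}

# Usage with the joined data frame from above:
# plot_weight_hist(cars)
# count_unique_weights(cars)
```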
## Correlation
This plot shows the correlation between every pair of features.
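A correlation matrix of this kind could be computed roughly as follows. This is a sketch, not the original plotting code: the helper `numeric_cor` is a hypothetical name, and `use = "pairwise.complete.obs"` is an assumption made so that the missing horsepower values do not turn whole rows of the matrix into NA.

```r
# Correlation matrix over the numeric columns of a data frame
numeric_cor <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  # Use all available pairs so that NAs in one column do not spoil the rest
  cor(num, use = "pairwise.complete.obs")
}

# Usage with the joined data frame from above (ID is only a row key, so drop it):
# corr <- numeric_cor(cars[setdiff(names(cars), "ID")])
# ggcorrplot::ggcorrplot(corr, lab = TRUE)
```

Factor and character columns such as model, origin, and car_name are skipped automatically because they are not numeric.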
This section includes charts that involve three or more variables simultaneously, to give us a more complete look at the questions that presented themselves in the previous sections. Building on the observation in the previous plot, I want to see how each region’s product mix has evolved over time. The best way to illustrate this is with a stacked bar chart over time for each region.
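One way such a stacked bar chart could be drawn with ggplot2 is sketched here. It assumes the columns `model`, `cylinders`, and `origin` as prepared earlier; the function name `plot_product_mix` and the axis labels are illustrative, not the original code.

```r
library(ggplot2)

# Stacked bars: number of cars per model year, split by cylinder count,
# with one panel per region of origin
plot_product_mix <- function(df) {
  ggplot(df, aes(x = model, fill = factor(cylinders))) +
    geom_bar(position = "stack") +
    facet_grid(origin ~ .) +
    labs(x = "Model year", y = "Number of cars", fill = "Cylinders")
}

# Usage with the joined data frame from above:
# plot_product_mix(cars)
```

With `facet_grid(origin ~ .)` the panels are stacked vertically in factor-level order, so the USA panel appears at the top.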
As shown in the top section, while the number of four-cylinder cars increases over time, six- and eight-cylinder cars comprise the majority of the United States’ product mix until 1980. Europe and Japan almost exclusively produce four-cylinder cars with just a few exceptions over the entire 13-year period. We can see this phenomenon illustrated when we compare each region’s weight distributions per year using boxplots.
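A comparative boxplot of this kind could be produced with a small helper like the one below. This is a sketch under the assumption that `model` and `origin` are factors as prepared earlier; `plot_by_year` and its arguments are hypothetical names.

```r
library(ggplot2)

# Boxplots of one numeric column per model year, one panel per region.
# `y` is the column name as a string, `y_label` the axis label.
plot_by_year <- function(df, y, y_label = y) {
  ggplot(df, aes(x = model, y = .data[[y]])) +
    geom_boxplot() +
    facet_grid(origin ~ .) +
    labs(x = "Model year", y = y_label)
}

# Usage with the joined data frame from above:
# plot_by_year(cars, "weight", "Weight")
```

Passing the column name as a string lets the same helper be reused for other measures, such as `"mpg"`.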
As we can see, US cars show much higher average weights than European and, especially, Japanese cars until about 1980, when the US weight distribution comes down considerably. From above we know that 1980 is when the US converted to a higher percentage of four-cylinder cars. Note that average weights stay more constant for Europe and Japan over the same time period.
Now we can create a similar comparative boxplot for MPG over time.
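Alongside the boxplot, the same comparison can be tabulated numerically. The following sketch uses dplyr to compute the median MPG per region and model year; the function name and the output column names `median_mpg` and `n_cars` are illustrative assumptions.

```r
library(dplyr)

# Median MPG and number of cars for each region and model year
median_mpg_by_region_year <- function(df) {
  df %>%
    group_by(origin, model) %>%
    summarise(median_mpg = median(mpg), n_cars = n(), .groups = "drop")
}

# Usage with the joined data frame from above:
# median_mpg_by_region_year(cars)
```

The median is used because it corresponds to the centre line that a boxplot draws for each group.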
The average MPG for each region shows an upward trend, especially towards the end of the 1970s and into the early 1980s. Since Europe and Japan also increased MPG, it is apparent that increasing overall fuel economy was not solely about changing the product mix away from six- and eight-cylinder cars. Indeed, the fuel economy of four-cylinder cars increased over time. We can see that more clearly by restricting our analysis to include only four-cylinder cars.
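Restricting the analysis to four-cylinder cars could look roughly like this. It is a sketch that assumes the `cars` columns used above; `four_cyl_only` and `plot_four_cyl_mpg` are hypothetical helper names.

```r
library(ggplot2)

# Keep only the rows for four-cylinder cars
four_cyl_only <- function(df) {
  df[which(df$cylinders == 4), , drop = FALSE]
}

# MPG per model year for four-cylinder cars, one panel per region
plot_four_cyl_mpg <- function(df) {
  ggplot(four_cyl_only(df), aes(x = model, y = mpg)) +
    geom_boxplot() +
    facet_grid(origin ~ .) +
    labs(x = "Model year", y = "MPG (four-cylinder cars only)")
}

# Usage with the joined data frame from above:
# plot_four_cyl_mpg(cars)
```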
The weight of a car is a strong determinant of its fuel-efficiency, as expressed by MPG. Four-cylinder cars are the lightest, and eight-cylinder cars are the heaviest. Therefore, four-cylinder cars get the best gas mileage.
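The relationship described above can be checked numerically with base R. This sketch computes the weight-MPG correlation and the mean weight and MPG per cylinder count; `weight_mpg_summary` is a hypothetical helper, and no result values are claimed here for the actual dataset.

```r
# Correlation between weight and MPG, plus group means by cylinder count
weight_mpg_summary <- function(df) {
  list(
    correlation = cor(df$weight, df$mpg, use = "complete.obs"),
    by_cylinders = aggregate(cbind(weight, mpg) ~ cylinders, data = df, FUN = mean)
  )
}

# Usage with the joined data frame from above:
# weight_mpg_summary(cars)
```

A strongly negative correlation, together with mean weight rising and mean MPG falling as the cylinder count increases, would be consistent with the reasoning in the paragraph above.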